36 research outputs found

    Towards Global Explanations for Credit Risk Scoring

    In this paper we propose a method to obtain global explanations for trained black-box classifiers by sampling their decision function to learn alternative interpretable models. The envisaged approach provides a unified solution for approximating non-linear decision boundaries with simpler classifiers while retaining the original classification accuracy. We use a private residential mortgage default dataset as a use case to illustrate the feasibility of this approach and to ensure the decomposability of attributes during pre-processing.
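The sampling idea can be caricatured in a few lines. Everything below is a hypothetical stand-in, not the paper's actual models: the "black box" is a hand-written scorer, and the interpretable copy is a one-feature threshold rule fitted by exhaustive search.

```python
import random

# Hypothetical black-box scorer standing in for a trained credit-risk model;
# in practice this would be the deployed classifier's predict function.
def black_box(loan_to_value, debt_to_income):
    return 1 if 0.6 * loan_to_value + 0.4 * debt_to_income > 0.5 else 0

# Step 1: sample the decision function over the input domain.
random.seed(0)
samples = [(random.random(), random.random()) for _ in range(5000)]
labels = [black_box(ltv, dti) for ltv, dti in samples]

# Step 2: fit a simple interpretable model (a one-feature threshold rule)
# by exhaustive search, minimising disagreement with the black box.
def fit_stump(samples, labels):
    best = None
    for feat in (0, 1):
        for t in [i / 100 for i in range(101)]:
            preds = [1 if x[feat] > t else 0 for x in samples]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, feat, t)
    return best

fidelity, feature, threshold = fit_stump(samples, labels)
print(f"rule: feature {feature} > {threshold:.2f}, fidelity {fidelity:.2f}")
```

The fitted rule is a global explanation of sorts: one readable condition that agrees with the black box on most of the sampled domain, with the fidelity score quantifying how much accuracy the simplification gives up.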

    Copying Machine Learning Classifiers

    We study the copying of machine learning classifiers, an agnostic technique to replicate the decision behavior of any classifier. We develop the theory behind the problem of copying, highlighting its properties, and propose a framework to copy the decision behavior of any classifier using no prior knowledge of its parameters or training data distribution. We validate this framework through extensive experiments using data from a series of well-known problems. To further validate this concept, we consider three different use cases where desiderata such as interpretability, fairness or productivization constraints need to be addressed. Results show that copies can be exploited to enhance existing solutions, improving them by adding new features and characteristics.
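A minimal sketch of agnostic copying, assuming only query access to the original model: the `oracle_predict` function below is a hypothetical stand-in for the deployed classifier, synthetic points are drawn from a standard normal prior (no knowledge of the real training distribution), and the copy lives in a deliberately simple hypothesis space (nearest centroid).

```python
import math
import random

# Stand-in for a deployed classifier exposed only through a predict call;
# its parameters and training data are assumed inaccessible.
def oracle_predict(x):
    return 1 if x[0] - 2 * x[1] > 0 else 0

random.seed(1)
# Sample synthetic points from a standard normal prior.
synthetic = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4000)]
labels = [oracle_predict(x) for x in synthetic]

# Train the copy: a nearest-centroid classifier over the oracle-labelled data.
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

c0 = centroid([x for x, y in zip(synthetic, labels) if y == 0])
c1 = centroid([x for x, y in zip(synthetic, labels) if y == 1])

def copy_predict(x):
    return 0 if math.dist(x, c0) < math.dist(x, c1) else 1

# Fidelity: agreement between copy and original on fresh samples.
test = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]
fidelity = sum(copy_predict(x) == oracle_predict(x) for x in test) / len(test)
print(f"copy fidelity: {fidelity:.3f}")
```

The copy never sees the oracle's parameters or real data, yet its agreement (fidelity) can be driven high simply by querying enough synthetic points.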

    Risk mitigation in algorithmic accountability: The role of machine learning copies

    Machine learning plays an increasingly important role in our society and economy and is already having an impact on our daily lives in many different ways. From several perspectives, machine learning is seen as the new engine of productivity and economic growth. It can increase business efficiency, improve any decision-making process, and spawn the creation of new products and services built on complex machine learning algorithms. In this scenario, the lack of actionable accountability-related guidance is potentially the single most important challenge facing the machine learning community. Machine learning systems are often composed of many parts and ingredients, mixing third-party components and software-as-a-service APIs, among others. In this paper we study the role of copies for risk mitigation in such machine learning systems. Formally, a copy can be regarded as an approximated projection operator of a model onto a target model hypothesis set. Under the conceptual framework of actionable accountability, we explore the use of copies as a viable alternative in circumstances where models can neither be re-trained nor enhanced by means of a wrapper. We use a real residential mortgage default dataset as a use case to illustrate the feasibility of this approach.

    Differential Replication for Credit Scoring in Regulated Environments

    Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that enables differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring, using a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as that of the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but also of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time-to-market delivery of these solutions.

    Environmental adaptation and differential replication in machine learning

    When deployed in the wild, machine learning models are usually confronted with an environment that imposes severe constraints. As this environment evolves, so do these constraints. As a result, the feasible set of solutions for the considered need is prone to change in time. We refer to this problem as that of environmental adaptation. In this paper, we formalize environmental adaptation and discuss how it differs from other problems in the literature. We propose solutions based on differential replication, a technique where the knowledge acquired by the deployed models is reused in specific ways to train more suitable future generations. We discuss different mechanisms to implement differential replication in practice, depending on the considered level of knowledge. Finally, we present seven examples where the problem of environmental adaptation can be solved through differential replication in real-life applications.

    On the design of an ECOC-compliant genetic algorithm

    Genetic Algorithms (GA) have previously been applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly takes into account the properties of the ECOC matrix. As a result, the considered search space is unnecessarily large. In this paper, a novel genetic strategy to optimize the ECOC coding step is presented. This strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework, thus reducing the search space and allowing the algorithm to converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, an analysis of the results in terms of performance, code length and number of Support Vectors shows that the optimization process is able to find very efficient codes with respect to the trade-off between classification performance and the number of classifiers. Finally, per-dichotomizer classification results show that the novel proposal is able to obtain similar or even better results while defining a more compact set of dichotomies and SVs compared to state-of-the-art approaches.
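For readers unfamiliar with the ECOC framework being optimised, here is a minimal decoding sketch. The 4-class coding matrix below is hand-written for illustration; in the paper, finding a good matrix like this is precisely what the genetic algorithm does.

```python
# A hand-written 4-class coding matrix: rows are class codewords, columns
# correspond to binary dichotomizers (one binary classifier per column).
coding_matrix = {
    "A": (1, 1, 1, 0, 0),
    "B": (1, 0, 0, 1, 1),
    "C": (0, 1, 0, 1, 0),
    "D": (0, 0, 1, 0, 1),
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(dichotomizer_outputs):
    # Assign the class whose codeword is closest in Hamming distance; with a
    # minimum pairwise distance of 3 here, any single dichotomizer error is
    # corrected.
    return min(coding_matrix,
               key=lambda c: hamming(coding_matrix[c], dichotomizer_outputs))

# The third dichotomizer votes wrongly, yet class "A" is still recovered.
prediction = decode((1, 1, 0, 0, 0))
print(prediction)
```

The trade-off the paper explores follows directly from this picture: longer codes give more error-correcting slack but require training and evaluating more dichotomizers.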

    Uncertainty-based Rejection Wrappers for Black-box Classifiers

    Machine Learning as a Service platforms are a very sensible choice for practitioners who want to incorporate machine learning into their products while reducing time and cost. However, to benefit from their advantages, a method for assessing their performance when applied to a target application is needed. In this work, we present a robust uncertainty-based method for evaluating the performance of both probabilistic and categorical classification black-box models, in particular APIs, that enriches the predictions obtained with an uncertainty score. This uncertainty score enables the detection of inputs with very confident but erroneous predictions, while protecting against out-of-distribution data points when deploying the model in a productive setting. We validate the proposal in different natural language processing and computer vision scenarios. Moreover, taking advantage of the computed uncertainty score, we show that one can significantly increase the robustness and performance of the resulting classification system by rejecting uncertain predictions.
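The rejection mechanism can be sketched as follows. The paper's actual uncertainty score is not detailed in this abstract, so the sketch uses predictive entropy as one common choice, and the probability vectors stand in for black-box API responses.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a categorical predictive distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def wrap_with_rejection(probs, threshold=0.5):
    # Hypothetical wrapper around a black-box API response: return the arg-max
    # class when the prediction is confident enough, otherwise abstain.
    u = predictive_entropy(probs)
    if u > threshold:
        return ("reject", u)
    return (max(range(len(probs)), key=probs.__getitem__), u)

confident = wrap_with_rejection([0.97, 0.02, 0.01])   # low entropy -> class 0
uncertain = wrap_with_rejection([0.40, 0.35, 0.25])   # high entropy -> reject
print(confident, uncertain)
```

Raising the threshold trades coverage for accuracy: the system answers fewer queries but the ones it does answer are, on average, more reliable.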

    A Survey on Uncertainty Estimation in Deep Learning Classification Systems from a Bayesian Perspective

    Decision-making based on machine learning systems, especially when this decision-making can affect human lives, is a subject of maximum interest in the Machine Learning community. It is, therefore, necessary to equip these systems with a means of estimating uncertainty in the predictions they emit in order to help practitioners make more informed decisions. In the present work, we introduce the topic of uncertainty estimation and analyze the peculiarities of such estimation when applied to classification systems. We analyze different methods that have been designed to provide classification systems based on deep learning with mechanisms for measuring the uncertainty of their predictions. We look at how this uncertainty can be modeled and measured using different approaches, as well as practical considerations for different applications of uncertainty. Moreover, we review some of the properties that should be borne in mind when developing such metrics. All in all, the present survey aims at providing a pragmatic overview of the estimation of uncertainty in classification systems that can be very useful for both academic research and deep learning practitioners.
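One family of methods in this space, Monte Carlo sampling of a stochastic network (as in MC dropout), can be caricatured in a few lines. The `stochastic_forward` function below is a toy stand-in for repeated noisy forward passes, not a real network.

```python
import random

random.seed(42)

# Toy stochastic classifier: repeated calls return different probability
# vectors, mimicking Monte Carlo dropout forward passes (purely illustrative).
def stochastic_forward(x):
    noise = random.gauss(0, 0.05)
    p1 = min(max(x + noise, 0.001), 0.999)
    return [1 - p1, p1]

def mc_predict(x, n_samples=100):
    draws = [stochastic_forward(x) for _ in range(n_samples)]
    # The averaged probabilities approximate the Bayesian predictive
    # distribution; the spread of the sampled class-1 probabilities is one
    # simple measure of (epistemic) uncertainty about the prediction.
    mean = [sum(d[c] for d in draws) / n_samples for c in range(2)]
    var = sum((d[1] - mean[1]) ** 2 for d in draws) / n_samples
    return mean, var

mean, var = mc_predict(0.9)
print(f"mean probs: {mean}, variance of p(class 1): {var:.4f}")
```

The survey's point is that "uncertainty" is not one number: the sampling variance above captures model (epistemic) uncertainty, while the shape of the mean distribution itself reflects data (aleatoric) uncertainty, and the two call for different treatments.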

    Machine Learning Based Surrogate Model for Press Hardening Process of 22MnB5 Sheet Steel Simulation in Industry 4.0

    The digitalization of manufacturing processes offers great potential in quality control, traceability, and the planning and setup of production. In this regard, process simulation is a well-known technology and a key step in the design of manufacturing processes. However, process simulations are computationally and time-expensive, typically beyond the manufacturing-cycle time, which severely limits their usefulness in real-time process control. Machine Learning-based surrogate models can overcome these drawbacks and offer the possibility of achieving a soft real-time response, which can potentially be developed into fully closed-loop manufacturing systems, at a computational cost that can realistically be implemented in an industrial setting. This paper explores the novel concept of using a surrogate model to analyze the case of the press hardening of a 22MnB5 steel sheet. This hot sheet metal forming process involves a crucial heat treatment step, directly related to the final part quality. Given its common use in high-responsibility automobile parts, this process is an interesting candidate for digitalization in order to ensure production quality and traceability. A comparison of different data and model training strategies is presented. Finite element simulations for a transient heat transfer analysis are performed with ABAQUS software and used to generate the training data for a ML-based surrogate model capable of predicting key process outputs for entire batch productions. The resulting surrogate predicts the behavior and evolution of the most important temperature variables of the process in a wide range of scenarios, with a mean absolute error of around 3 °C, while reducing the computation time by four orders of magnitude with respect to the simulations. Moreover, the methodology presented is not only relevant for manufacturing purposes, but can also be a technology enabler for advanced systems, such as digital twins and autonomous process control.
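The surrogate-modelling workflow can be sketched end to end. Everything below is a hypothetical stand-in: the "expensive simulation" is a linear-plus-noise toy in place of an ABAQUS finite element run, and the surrogate is a least-squares line in place of the paper's ML model.

```python
import random

random.seed(7)

# Stand-in for an expensive finite-element thermal simulation: maps a process
# parameter (e.g. a transfer time in seconds) to a blank temperature in °C.
# The true physics is far richer; this linear-plus-noise form is illustrative.
def expensive_simulation(transfer_time):
    return 850.0 - 12.0 * transfer_time + random.gauss(0, 1.0)

# Offline: run the slow simulator over a design of experiments to build data.
X = [random.uniform(2.0, 10.0) for _ in range(200)]
y = [expensive_simulation(t) for t in X]

# Train a least-squares line as the surrogate; a neural network or gradient
# boosting model would play this role in a realistic setting.
n = len(X)
mx, my = sum(X) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(X, y)) / sum((a - mx) ** 2 for a in X)
intercept = my - slope * mx

def surrogate(transfer_time):
    # Online: answers in microseconds instead of simulation hours.
    return intercept + slope * transfer_time

mae = sum(abs(surrogate(a) - b) for a, b in zip(X, y)) / n
print(f"surrogate MAE on training data: {mae:.2f} °C")
```

The pattern is the same at industrial scale: pay the simulation cost once, offline, to generate training data, then query the cheap surrogate inside the manufacturing-cycle time for process monitoring or control.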